Indexing HDFS Data in PDW: Splitting the data from the index
Abstract
There is a growing interest in making relational DBMSs work synergistically with MapReduce systems. However, there are interesting technical challenges associated with figuring out the right balance between the use and co-deployment of these systems. This paper focuses on one specific aspect of this balance, namely how to leverage the superior indexing and query processing power of a relational DBMS for data that is often more cost-effectively stored in Hadoop/HDFS. We present a method to use conventional B+-tree indices in an RDBMS for data stored in HDFS and demonstrate that our approach is especially effective for highly selective queries.
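The core idea in the abstract is to keep the index in the relational engine while the records themselves stay in HDFS. The minimal Python sketch below illustrates that idea under explicit assumptions. Each index entry stores a key plus a pointer (file name, byte offset, length) into an HDFS file. In-memory byte strings stand in for HDFS files, and a sorted list searched with `bisect` stands in for a B+-tree. All names (`RecordLocator`, `SortedKeyIndex`, `build_index`, `fetch`) are hypothetical and are not taken from PDW.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordLocator:
    """Hypothetical pointer from the index into an HDFS-resident file."""
    file_name: str
    offset: int  # byte offset where the record starts in the file
    length: int  # record length in bytes


class SortedKeyIndex:
    """Sorted (key, locator) pairs searched with bisect.

    Assumption: this stands in for a B+-tree kept inside the RDBMS. It
    supports the same ordered range lookups, but not a paged on-disk layout.
    """

    def __init__(self, entries):
        # Sort on the key only, so locators are never compared.
        self._entries = sorted(entries, key=lambda e: e[0])
        self._keys = [k for k, _ in self._entries]

    def range_lookup(self, lo, hi):
        """Return locators for all keys in the closed range [lo, hi]."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [loc for _, loc in self._entries[start:end]]


def build_index(files, key_of):
    """Scan each file once and record where every newline-delimited record starts.

    Assumption: `files` maps a file name to its bytes, standing in for HDFS.
    """
    entries = []
    for name, data in files.items():
        offset = 0
        for line in data.splitlines(keepends=True):
            entries.append((key_of(line), RecordLocator(name, offset, len(line))))
            offset += len(line)
    return SortedKeyIndex(entries)


def fetch(files, locators):
    """Read only the byte ranges the index points to, instead of scanning files."""
    return [files[loc.file_name][loc.offset:loc.offset + loc.length]
            for loc in locators]


if __name__ == "__main__":
    # Two small "HDFS" files of comma-separated records keyed by the first field.
    files = {
        "part-0000": b"1,alice\n7,bob\n",
        "part-0001": b"3,carol\n9,dave\n",
    }
    index = build_index(files, lambda line: int(line.split(b",", 1)[0]))
    print(fetch(files, index.range_lookup(3, 7)))
```

For a selective predicate such as `3 <= key <= 7`, only the two matching byte ranges are read, while a full scan would read every file. This is consistent with the abstract's claim that the approach pays off most for highly selective queries. The sketch leaves out how such an index would be kept consistent when the HDFS files change.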
Similar resources
Towards Zero-Overhead Adaptive Indexing in Hadoop
Several research works have focused on supporting index access in MapReduce systems. These works have allowed users to significantly speed up selective MapReduce jobs by orders of magnitude. However, all these proposals require users to create indexes upfront, which might be a difficult task in certain applications (such as in scientific and social applications) where workloads are evolving or ...
Does the Platelet Index Have a Guiding Role in the Association of Cancer and Pulmonary Thromboembolism?
Introduction: The diagnostic value of the D-dimer test varies with variable platelet numbers and functions in patients suffering from cancer and concomitant pulmonary thromboembolism (PTE). This requires easy and reliable evaluation tests. In this study, we aimed to investigate the hypothesis that platelet functions may be more guiding in the prediction and diagnosis of PTE rather than the numb...
Design Architecture-Based on Web Server and Application Cluster in Cloud Environment
Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the work of searching and fetching from thousands of computers. The data in HDFS is scattered and needs lots of time to retrieve. The major idea is to design a w...
Design an Efficient Big Data Analytic Architecture for Retrieval of Data Based on Web Server in Cloud Environment
Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the work of searching and fetching from thousands of computers. The data in HDFS is scattered and needs lots of time to retrieve. The major idea is to design a w...
Journal title:
- PVLDB
Volume 7
Publication date: 2014